A Regularized Compression Method to Unsupervised Word Segmentation
Abstract
Languages are constantly evolving through their users due to the need to communicate more efficiently. Under this hypothesis, we formulate unsupervised word segmentation as a regularized compression process. We reduce this process to an optimization problem and propose a greedy inclusion solution. Preliminary test results on the Bernstein-Ratner corpus and Bakeoff-2005 show that our method is comparable to the state of the art in terms of effectiveness and efficiency.
Similar resources
An improved MDL-based compression algorithm for unsupervised word segmentation
We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
A Method for Body Fat Composition Analysis in Abdominal Magnetic Resonance Images Via Self-Organizing Map Neural Network
Introduction: The present study aimed to suggest an unsupervised method for the segmentation of visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) in axial magnetic resonance (MR) images of the abdomen. Materials and Methods: A self-organizing map (SOM) neural network was designed to segment the adipose tissue from other tissues in the MR images. The segmentation of SAT and VA...
Unsupervised Texture Image Segmentation Using MRFEM Framework
Texture image analysis is one of the most important areas of image processing in medical sciences and industry. To date, different approaches have been proposed for the segmentation of texture images. In this paper, we present unsupervised texture image segmentation based on a Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...
Neural Regularized Domain Adaptation for Chinese Word Segmentation
For Chinese word segmentation, the large-scale annotated corpora mainly focus on newswire, and only a handful of annotated data is available in other domains such as patents and literature. Considering the limited amount of annotated target domain data, it is a challenge for segmenters to learn domain-specific information while avoiding overfitting at the same time. In this paper, we propose...
Publication date: 2012